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Abstract 

Background: Betaretroviruses infect a wide range of species including primates, rodents, ruminants, and marsupials. 
They exist in both endogenous and exogenous forms and are implicated in animal diseases such as lung cancer in 
sheep, and in human disease, with members of the human endogenous retrovirus-K (HERV-K) group of 
endogenous betaretroviruses (pTRVs) associated with human cancers and autoimmune diseases. To improve our 
understanding of betaretroviruses in an evolutionarily distinct host species, we characterized (3ERVs present in the 
genomes and transcriptomes of mega- and microbats, which are an important reservoir of emerging viruses. 

Results: A diverse range of full-length (3ERVs were discovered in mega- and microbat genomes and transcriptomes 
including the first identified intact endogenous retrovirus in a bat. Our analysis revealed that the genus 
Beta retrovirus can be divided into eight distinct sub-groups with evidence of cross-species transmission. 
Betaretroviruses are revealed to be a complex retrovirus group, within which one sub-group has evolved from 
complex to simple genomic organization through the acquisition of an env gene from the genus Gammaretrovirus. 
Molecular dating suggests that bats have contended with betaretroviral infections for over 30 million years. 

Conclusions: Our study reveals that a diverse range of betaretroviruses have circulated in bats for most of their 
evolutionary history, and cluster with extant betaretroviruses of divergent mammalian lineages suggesting that their 
distribution may be largely unrestricted by host species barriers. The presence of pTRVs with the ability to transcribe 
active viral elements in a major animal reservoir for viral pathogens has potential implications for public health. 
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Background 

Retroviruses (family Retroviridae) are a diverse and widely 
distributed family of RNA viruses distinguished by their 
use of a viral RNA-dependent DNA polymerase (reverse 
transcriptase; RT) and ability to integrate into the ge- 
nomes of their cellular hosts [1]. In addition to the exist- 
ence of infectious viral particles that are horizontally 
transmitted between hosts (exogenous retroviruses), the 
capacity of retroviruses to integrate into the host germline 
also generates vertically transmissible endogenous retro- 
viruses (ERVs) [1,2]. ERVs may or may not be capable 
of producing infectious viral particles, and germline 
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integration over the course of multiple generations typic- 
ally leads to the accumulation of mutations that render 
them defective and non-functional [2] . 

The retroviral family is composed of seven gene- 
ra: Alpharetrovirus, Betaretrovirus, Gammaretrovirus, 
Deltaretrovirus, Epsilonretrovirus, Lentivirus, and 
Spumavirus [3]. The genomic organization of retrovi- 
ruses is classified as either 'simple' or 'complex', with 
simple retroviruses encoding the structural poly- 
proteins Gag and Env, and the functional polyproteins 
Pro and Pol [4]. Complex retroviruses encode additional 
accessory and regulatory proteins with diverse functions 
that typically establish and maintain virus replication 
and pathogenesis [5]. The core elements of all retrovi- 
ruses are flanked by a pair of typically untranslated nu- 
cleotide regions at their 5 ' and 3 ' ends. In the provirus, 
formed by integration of the viral cDNA into the host 
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cell chromosome, these regions are referred to as 'long 
terminal repeats' (LTR) [4] . 

Exogenous retroviruses of zoonotic origin have been 
associated with disease in humans, the most notable be- 
ing human immunodeficiency virus (HIV) [6] . Other ret- 
roviruses such as human foamy virus (HFV) and human 
T-cell leukemia virus (HTLV) are known to be capable 
of infecting humans [7,8]. The retroviruses most recently 
associated with human disease are betaretroviruses. The 
up-regulation of gene products derived from the human 
endogenous retrovirus-K (HERV-K) group of betare- 
troviruses has been linked to a diverse range of cancers 
such as those of the breast, ovaries, and prostate along- 
side other significant human maladies [9,10]. 

The genus Betaretrovirus consists of the Type B and 
Type D groups of exogenous and endogenous retrovi- 
ruses and the HERV-K group of endogenous retrovi- 
ruses. Among the exogenous, infectious members of the 
genus are the Type B Mouse mammary tumour virus 
(MMTV), the Type D Jaagsiekte sheep retrovirus (JSRV), 
which causes pulmonary carcinoma in sheep, and the 
Type D Mason-Pfizer monkey virus (MPMV) which 
causes wasting and immunosuppression in new-born 
Rhesus monkeys [11-13]. All betaretroviruses utilize var- 
iants of the lysine tRNA primer binding site (PBS) and 
encode a deoxyuridine triphosphatase (dUTPase), within 
their pro gene which functions as a nucleocapsid- 
dUTPase fusion protein [14-16]. Type B and Type D 
betaretroviruses differ in several respects including their 
complement of accessory factors, virion morphology, 
strategies for RNA nuclear export, and the length of 
their LTR regions. Type B betaretroviruses contain 
spherical viral cores and have LTRs of -1,200 nucleo- 
tides while Type D contain cylindrical viral cores and 
have LTRs of -300 nucleotides. The prototypical Type B 
betaretrovirus, MMTV, encodes the accessory proteins 
regulator of export of MMTV mRNA (Rem) and nega- 
tive acting factor (Naf), which have roles in viral mRNA 
export, protein synthesis and gene expression [17-19], in 
addition to the virulence factor, superantigen (Sag) [20]. 
The Type D retrovirus JSRV has been shown to encode 
the fraws-acting factor Rej which has a role in protein 
synthesis and may assist RNA nuclear export [21]. While 
no distinct oncogenes or Sag-like virulence-associated 
proteins are known to be encoded by Type D betare- 
troviruses, the Env protein of JSRV is associated with 
oncogenesis [13,14]. 

There are two major strategies employed by betare- 
troviruses to export unspliced or partially spliced viral 
RNA from the nucleus that use distinct export pathways. 
Complex betaretroviruses such as MMTV employ a HIV 
Rev-like accessory protein encoded within the env gene 
that binds and facilitates export of intron containing retro- 
viral RNA by recruitment of the cellular karyopherin 



export factor, chromosome region maintenance 1/exportin 
1 (Crml/Xpol) [17,19]. Simple betaretroviruses such as 
MPMV contain a constitutive transport element (CTE) 
within the nucleotide sequence at the 3 ' end of the retro- 
viral genome that recruits a cellular binding factor, Tap 
(nuclear RNA export factor 1; NXF1) which mediates nu- 
clear export [22,23]. 

Importantly, ERVs provide a unique opportunity to 
study the evolutionary history of this family of viruses as 
they are essentially genetic 'fossils' of past retroviral in- 
fections [2,24]. As such, their existence serves as an indi- 
cation of the potential host range of a given retroviral 
lineage and may be interpreted as evidence for the pos- 
sible existence of exogenous retroviruses that have yet to 
be isolated. Indeed, previous studies have reported a 
number of endogenous betaretroviruses (pERVs) in spe- 
cies for which no exogenous betaretrovirus has yet been 
identified. These include mammalian species as diverse 
as primates, horses, rats, lemurs, and an Australian mar- 
supial, the common brushtail possum [25-27]. 

There are over 1,100 known species of bats (order 
Chiroptera), accounting for approximately 20% of all mam- 
malian species [28]. Bats are relatively divergent from other 
mammals, having branched off from the Perissodactyla 
(containing horses) approximately 88 million years ago 
(mya) [29]. They are divided into two major groups: 
megabats (suborder Megachiroptera) which are mainly 
fruit-eating, and microbats (suborder Microchiroptera), 
small insectivores that navigate by means of echolo- 
cation [30]. Notably, bats harbour over 100 viral 
species from a diverse range of virus families includ- 
ing the Paramyxoviridae, Coronaviridae, Herpesviridae, 
Rhabdoviridae, Arenaviridae, Togaviridae, Flaviviridae, 
Orthomyxoviridae, Reoviridae, Bunyaviridae, Filoviridae, 
and Picornaviridae [31]. Bats, belonging to the mamma- 
lian superorder Laurasiatheria, are a major viral reservoir 
that is evolutionarily distinct from another major viral res- 
ervoir, rodents, which together with primates belong to 
the superorder Euarchontoglires [29,32] . 

Bats have recently gained attention as they have been 
implicated in numerous newly emerging diseases of 
humans caused by viruses such as SARS-coronavirus, 
Hendra virus, Nipah virus, and the Ebola virus [33-35]. 
This track record of zoonotic transmission of previously 
unknown viral pathogens from bats to humans has 
prompted calls for a proactive approach to future emer- 
ging diseases originating in bats [30]. To this end a nat- 
ural history survey of bats has begun, and we have 
recently reported the discovery of diversified defective 
endogenous gammaretroviruses in both mega- and 
microbats [36,37]. 

Previous studies of (3ERVs have tended to focus on iso- 
lated viruses, although a report on the |3ERVs of murid 
hosts indicated that the genus Betaretrovirus might 
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possess a diverse and previously unrecognized range 
of sub-types extending beyond the classical Type B/Type 
D paradigm [25]. Using transcriptome and genome ana- 
lyses of the megabats Pteropus alecto (black flying fox) 
and Pteropus vampyrus (large flying fox), and the 
microbats Myotis lucifugus (little brown bat), Rhinolophus 
megaphyllus (eastern horseshoe bat), and Rhinolophus 
ferrumequinum (greater horseshoe bat), we herein exam- 
ine |3ERVs present in a diverse range of bat species. In 
conjunction with phylogenetic analyses, we incorporated 
the diversity of genomic organizations and the use of spe- 
cific lysine tRNA PBS to identify eight distinct groups of 
betaretroviruses. 

Results 

(3ERVs in bat transcriptomes 

To determine if bats contained and expressed a full suite 
of integrated endogenous betaretroviral genes we gener- 
ated and analyzed transcriptome databases of P. alecto, 
R. megaphyllus, and R. ferrumequinum. Gag, Pol, and 
Env protein sequences were translated from the ge- 
nomes of extant betaretroviruses: MMTV, JSRV, MPMV, 
squirrel monkey retrovirus (SMR), and simian retrovirus 
(SRV). Local tBLASTn searches were conducted to de- 
termine if the transcriptomes contained nucleotide se- 
quences that, when translated into any of their six 
reading frames, contained significant protein sequence 
similarity to the betaretoviral protein query sequences. 



Because the variation in length between different tran- 
scripts causes difficulty when interpreting relatedness if 
similarity is expressed as a percentage identity, the sig- 
nificance of the similarity levels observed was deter- 
mined on the basis of the e-value (probability of random 
sequence identity) of the BLAST hits. Each transcrip- 
tome was found to contain mRNA sequences with not- 
able similarity (e-values < lxlO" 10 ) to the betaretroviral 
proteins Gag, Pol, and Env, with the exception of 
the R. ferrumequinum transcriptome in which no beta- 
retroviral gag-like transcripts were identified (Table 1). 

Reciprocal BLASTx searches of the transcript hits 
with the lowest e-values (i.e. the top hits presented in 
Table 1) against the NCBI non-redundant protein data- 
base returned predominantly betaretroviral hits. The 
majority of the mRNA sequences identified within the 
bat transcriptomes were partial, not being of sufficient 
length to reveal an entire gag, pol, or env gene sequence. 
As a point of reference, the nucleotide sequence lengths 
of MPMV gag, pol, and env are 1,974, 2,583, and 1,758, 
respectively, while the majority of the transcripts identi- 
fied in the BLAST analyses were < 1,000. The P. alecto 
transcriptome was found to contain two retroviral tran- 
scripts 5,433 and 5,830 nucleotides in length which 
overlap each other by 3,152 bases with 100% sequence 
identity. The extent of overlap and perfect identity indi- 
cated that the two sequences likely represented a full- 
length retroviral genomic sequence >8,103 bases in 



Table 1 Betaretroviral elements in mega- and microbat transcriptomes 







P. alecto 




R. megaphyllus 


R. ferrumequinum 


Lowest a 
e-value 


# Hits b 


Lowest 
e-value 


# Hits 


Lowest 
e-value 


# Hits 


Betaretroviruses JSRV 


Gag 


1.65x10" 121 


150 


3.86x1 0" 28 


9 


ND 


0 




Pol 


<1x10- 250 


246 


7.80x1 0" 36 


16 


1.0x10~ 58 


1 




Env 


1.29X1CT 51 


48 


4.00x1 0" 1 5 


3 


ND 


0 


SMR 


Gag 


1.75x10~ 59 


185 


1.31x10~ 15 


3 


ND 


0 




Pol 


<1x10- 250 


241 


2.1 3x1 0" 40 


5 


7.0x1 0" 56 


1 




Env 


2.58x1 0" 31 


137 


2.65x1 0" 31 


2 


1.0x10~ 8/ 


1 


MPMV 


Gag 


2.1 6x1 0- 104 


190 


1.49x10~ 20 


5 


ND 


0 




Pol 


<1x10- 250 


287 


7.71x10~ 34 


21 


1.0x10~ 58 


1 




Env 


1.83X1CT 33 


140 


1.48x10~ 31 


6 


2.0x1 0" 98 


1 


MMTV 


Gag 


6.85X10" 53 


90 


4.82x1 0" 13 


2 


ND 


0 




Pol 


<1x10~ 250 


269 


1.36x10~ 31 


15 


2.0x1 0" 45 


1 




Env 


8.39x1 0" 54 


19 


ND 


0 


ND 


0 


SRV 


Gag 


1.70X10" 108 


185 


2.96x1 0" 20 


5 


ND 


0 




Pol 


<1x10~ 250 


290 


9.34x1 0" 39 


21 


9.0x1 0" 62 


1 




Env 


1.31x10~ 35 


136 


2.75x1 0" 26 


2 


1.0X10- 109 


1 



a Gag, Pol, and Env proteins were translated from the genomes of extant betaretroviruses and used as search queries in a tBLASTn analysis of the lllumina 
sequenced transcriptome of P. alecto, and the 454 sequenced transcriptomes of ft megaphyllus, and ft. ferrumequinum. The e-value of the transcriptome hit with 
the greatest sequence similarity (lowest e-value) to each query sequence is displayed, 
^he number of transcripts identified in the transcriptomes with an e-value < 1 x 10" 10 . ND: No data. 
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length that was later determined through phylogenetic 
analysis to be a pERV. This full-length P. alecto pERV gen- 
omic transcript was named PaERV-|3A (Pteropus alecto 
Endogenous Retrovirus - betaretrovirus A) (Figure 1). In 
addition to PaERV-|3A, different transcripts covering the 
length of one distinct betaretroviral pol transcript (PaPol- 
01) most closely related to JSRV pol and one env transcript 
(PaEnv-01) similar to Type C gammaretrovirus and 
MPMV-like Type D betaretrovirus env were identified in 
the P. alecto transcriptome. A single transcript (RfEnv-01) 
covering the length of a gammaretrovirus-like env gene 
most similar to RD114 env was identified in the 
R. ferrumequinum transcriptome. These transcripts were 
incorporated into the subsequent phylogenetic analyses. 

The PaERV-|3A sequence was found to begin 25 nucle- 
otides upstream of the gag start methionine and contains 
all of the expected core retroviral genes along with the 
betaretroviral dUTPase domain (Figure 1). All of the 
genes were found to be defective as they each contained 
frameshift mutations. In addition, the pol and env genes 
contained premature stop codon mutations. Identifica- 
tion of a 19 nucleotide polypurine tract (PPT) allowed 
the delineation of the beginning of the unique 3' 
(U3) region. Conserved retroviral active site motifs 
were present in the protease (DxG), reverse transcrip- 
tase (DDD), and integrase (DDE) domains. The major 
homology region (MHR; nucleotide coordinates 1,456 - 
1,496) and zinc fingers (nucleotide coordinates 1,752 - 
1,805 and 1,866 - 1,919) conserved in gag were also 
present. 

Two additional ORFs were identified; the first overlaps 
the 3 ' end of the pro gene while the second overlaps the 
U3 region. However, protein translations of the ORFs 
compared to the publicly accessible protein family 
(Pfam) database revealed no known protein domains. In 
addition, BLASTp analysis of the translations against the 
NCBI non-redundant protein database yielded no hits. 
Later identification of closely related P. vampyrus |3ERVs 
(PvERV-|3J and PvERV-|3K) indicated that the ORF over- 
lapping the U3 region was not legitimate. 



PERVs in bat genomes 

Given the successful identification of betaretrovirus-like 
nucleotide sequences in the transcriptomes, we sought to 
mine the publicly available genomes of P. vampyrus and 
M. lucifugus for full-length endogenous betaretroviruses. 
The aforementioned extant betaretoviral protein se- 
quences together with the retroviral mRNA sequences 
identified in the bat transcriptomes were used to conduct 
tBLASTn and tBLASTx searches on the P. vampyrus and 
M. lucifugus genomes. These searches revealed a number 
of hits in the genomes that contained betaretroviral gag, 
pol, and env genes. Full-length ERVs were delineated by 
the identification of retroviral gag, pol, and env sequences 
positioned next to each other and located between a pair 
ofLTRs. 

In total, we identified 11 full-length (BERVs in 
P. vampyrus and six in M. lucifugus (Table 2). These bat 
|3ERVs contain all of the expected core elements and the 
betaretrovirus-specific dUTPase domain. As retroviruses 
were previously categorized based on the specific tRNA 
that anneals to their PBS required for initiation of reverse 
transcription, we determined the specific tRNA used by all 
identified bat |3ERVs through nucleotide alignment with 
known mammalian lysine tRNA sequences (Additional 
file 1: Figure SI). The PBS was intact and could be identi- 
fied in the majority of the bat |3ERVs, and all but one 
(M1ERV-|3E) was found to harbour a PBS complementary 
to either tRNA lysine 1,2 (Lys 1,2) or tRNA lysine 3 (Lys 
3) typical of betaretroviruses. Reciprocal BLASTp searches 
confirmed that the Gag, Pol, and Env of these full-length 
ERVs were more similar to known betaretroviral proteins 
than those of other retroviral genera with Pol sequence 
similarities ranging from 64% to 76% (Additional file 2: 
Table SI). 

All of the bat |3ERVs possessed LTRs of 300-500 nucleo- 
tides in length, as expected for Type D betaretroviruses 
with the exception of PvERV-|3B with LTR length typical 
of Type B betaretrovirues (1265 bp) (Table 2). Each bat 
[3ERV was found to contain a PPT immediately upstream 
of their 3' LTR regions. We analyzed each pro and pol 



dUTPase DxG 



PaERV-pA 



5'- 



<^W 1 DDD 

ORF 1 



^DE 







PPT 
I ORF* 



3' 



Figure 1 A schematic representation of PaERV-pA. Two transcripts were identified in the P. alecto lllumina sequenced transcriptome that 
overlapped by 3,152 nt with 100% sequence identity which were used to assemble the PaERV-pA genomic sequence. Indicated are the retroviral 
genes gag, pro, pol, and env, which have been rendered defective by random mutation since integration. Also shown are the key enzymatic 
active sites of the viral protease (DxG), reverse transcriptase (DDD), and integrase (DDE); the betaretroviral dUTPase domain in pro; two unique 
open reading frames (ORFs); the polypurine tract (PPT); and the (Unique 3') (U3) region. ORF* does not appear to be genuine, but rather has 
arisen as a result of an insertion mutation that has disrupted a stop codon. 
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Table 2 Full-length endogenous betaretroviruses identified in the I 
the Sanger sequenced genomes of P. vampyrus and M. lucifugus 



lumina sequenced transcriptome of P. alecto and 















Extra 


LTR 








Genome 


gag" 


pro c 


pol 


env 


ORFs d 


Length 6 


PBS f 


Additional notes 




Size 3 (nt) 










> 300 nt 


(nt) 






P. vompyrus 




















PvERV-(3A 


7,705 


Defective 


Defective 


Defective 


Defective 


0 


407* 


Unknown 


100 nt NSR overlapping 5' LTR and 
beginning of gag gene 


PvERV-(3B 


9,257 


Defective 


Intact 


Intact 


Defective 


1 


1265 


Lys 3 


102 nt NSR within gag gene 


PvERV-pC 


7,126 


Defective 


Defective 


Defective 


Intact 


o 


366* 


Lys 3 


Short env gene may indicate in-frame 

deletion 


PvERV-pD 


7,928 


Defective 


Defective 


Defective 


Defective 


1 


398 


Lys 1,2 


NSRs overlapping 5' LTR and pro-pol 
junction 


PvERV-(3E 


7,879 


Intact 


Defective 


Intact 


Intact 


1 


371* 


Lys 3 


A single stop mutation in pro prevents this 
ERV being intact 


PvERV-PF 


7,804 


Intact 


Defective 


Defective 


Defective 


1 


370 


Lys 3 


41 nt NSR at extreme 5' end of the 5' LTR 


PvERV-pG 


7,631 


Defective 


Intact 


Defective 


Defective 


0 


387* 


Lys 3 


Appears to contain a deletion that 
overlaps PPT and 3 'LTR 


PvERV-pH 


7,843 


Defective 


Intact 


Defective 


Defective 


1 


361 


Lys 3 




PvERV-pi 


7,809 


Defective 


Defective 


Defective 


Defective 


0 


371* 


Lys 3 




PvERV-pj 


8,773 


Defective 


Intact 


Intact 


Intact 


2 


427* 


Lys 1,2 




PvERV-pK 


8,61 1 


Defective 


Defective 


Intact 


Intact 


1 


425* 


Lys 1,2 


3' LTR appears truncated 


P. alecto 




















PaERV-pA 


>8,103 § 


Defective 


Defective 


Defective 


Defective 


2 


Unknown 


Unknown 


Contains artifact ORF (denoted as ORF* in 
Figure 1) 


M. lucifugus 




















MIERV-PA 


9,866 


Defective 


Intact 


Defective 


Defective 


0 


422* 


Lys 1 ,2 


Large foreign insertion in 5' LTR 


MIERV-PB 


8,121 


Unknown 


Defective 


Intact 


Defective 


0 


480 


Lys 3 


669 nt NSR within gag gene 


MIERV-PC 


8,102 


Intact 


Intact 


Intact 


Intact 


0 


479* 


Lys 3 


Completely intact 


MIERV-PD 


9,007 


Defective 


Defective 


Defective 


Intact 


0 


479* 


Lys 3 


Contains short foreign insertions in pro 
and pol genes 


MIERV-PE 


7,890 


Defective 


Defective 


Defective 


Defective 


1 


440 


Lys + 




MIERV-PF 


8,235 


Intact 


Intact 


Defective 


Defective 


1 


-1/0 


Lys 3 


Small ~45nt deletion overlapping pol and 
env genes 



a The genome size is given for the proviral version of the fiERVs. The genome size of PaERV-fiA is uncertain as the known sequence begins 25nt upstream of the 
gag gene and does not include the (unique 5'} region. 

b The core retroviral genes gag, pro, pol, and env that contain frameshift or premature stop mutations are described as 'defective', those that contain neither of 
these are described as 'intact' in bold font. 

c The pro open reading frame (ORF) of each BERV was found to encode a betaretroviral dUTPase protein domain. 
d The number of ORFs that do not code for the core genes and are 300 nucleotides or greater in length. 

e The length of the long terminal repeats (LTRs). * For those BERVs whose 5' and 3' LTR lengths differ, the value of the 5' LTR is given. 

f The specific lysine (Lys) tRNA complementary to the primer binding site (PBS) for each BERV is given. + The specific identity of the PBS of MIERV-BE is uncertain. 
NSR: non-sequenced region. 



gene and identified the expected enzymatic active site mo- 
tifs in the retroviral protease (DxG), reverse transcriptase 
(DDD), and integrase (DDE) domains. The gag gene of each 
|3ERV contained the expected MHR and zinc-knuckles. 
While the M. lucifugus genome sequencing coverage was 
relatively high (7x coverage), the P. vampyrus genome 
has only been sequenced to 2.6x coverage. The nature of a 
low-coverage genome such as this means that within the as- 
sembled 'scaffolds' there occasionally exist stretches of nu- 
cleotides of ambiguous identity. In this regard, several of the 



bat (BERVs reported herein contain short 'non-sequenced re- 
gions' (NSR) (Table 2). As a result, the PBS present in 
PvERV-pA and the MHR of PvERV-(3B could not be identi- 
fied as they contained NSRs overlapping those elements. To 
confirm that each (3ERV was the product of a retroviral inte- 
gration event, the four-nucleotide repeats known as gen- 
omic target site duplication (TSD) sequences that flank the 
proviruses were identified (Additional file 2: Table S2). TSDs 
were identified for all proviral |3ERVs with the exception of 
PvERV-|3D and F whose 5' LTRs were masked by NSR, 
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PvERV-|3K whose 3' LTR appears to be truncated, and 
PvERV-[3B which is the sole (3ERV to have intact and unam- 
biguous LTRs yet no identifiable TSDs. To determine if 
closely related clusters of pERVs were generated as a result 
of post-integration chromosomal duplication events, we 
compared their flanking chromosomal DNA through a 
BLASTn analysis (Additional file 2: Table S3). One pair of 
bat |3ERVs (PvERV-(3K and PvERV-(3J) was found to have 
homology in the chromosomal regions immediately up- and 
downstream of the proviruses. PvERV-pT< and PvERV-|3J 
appear to have arisen as a result of a duplication of a single 
integrated provirus. The truncation of the 3' LTR of 
PvERV-|3K suggests a chromosomal duplication event. 

Phylogenetic analysis of betaretroviral Gag, Pol and Env 
elements 

Next, we examined the phylogenetic relationships of the 
bat PERVs identified in our analysis of the bat genomes 



and transcriptomes (Table 2). Accordingly, the Gag, Pol, 
and Env of the full-length bat pERVs were aligned 
with those of known exogenous and endogenous 
betaretroviruses and phylogenetic trees were estimated 
for each (Figure 2). 

In all three trees a great diversity of bat (BERVs was ob- 
served, with individual pERVs clustering with members 
of the Type D (e.g. MPMV and JSRV), Type B (e.g. 
MMTV), and HERV-K groups. The close relationship 
between viral sequences derived from transcriptomes 
and some endogenous viral sequences mined from bat 
genomes suggests that at least some of the bat |3ERVs 
have the ability to transcribe. Notably, a number of bat 
|3ERVs (PvERV-|3J, K and PaERV-pA), together with sev- 
eral exogenous betaretroviruses, were found to possess 
Env sequences that formed a cluster so highly divergent, 
and more closely related to gammaretroviruses, as to re- 
quire omission from the initial betaretroviral Env tree 
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Figure 2 Phylogenetic relationships of bat and non-bat betaretroviruses. Maximum likelihood phylogenetic trees of (A) Gag, (B) Pol, and (C) 
Env amino acid sequences. Bootstrap values <70% are not shown, and branch lengths are drawn to a scale of amino acid substitutions per site. 
Bootstrap values are denoted as ** >90%; * >70% and < 90%. The trees are midpoint rooted for purposes of clarity only. pERV proteins of 
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highlighted with a grey background (y-Env) contain betaretroviruses whose Env sequence is not sufficiently closely related to the Env of other 
betaretroviruses to be included in the Env tree. 



Hayward et at. Retrovirology 2013, 10:35 
http://www.retrovirology.eom/content/1 0/1/35 



Page 7 of 19 



(Figure 2C). Finally, we also found some evidence for 
within-genome recombination (e.g. M1ERV-(3C, D and E) 
as reflected in the phylogenetic incongruence between 
the Gag and Pol and Env trees. 

Phylogenetic analysis of betaretrovirus and 
gammaretrovirus Env 

Our reciprocal tBLASTx searches indicated that the 
P. alecto ERV (PaERV-|3A) and two of the P. vampyrus 
ERVs (PvERV-pj and K) encoded Env sequences that 
were more similar to gammaretroviral Env, while still 
possessing Gag and Pol sequences that closely resembled 
those of known betaretroviruses (see above). To confirm 



this observation we undertook a phylogenetic analysis of 
the Env sequences of known gammaretroviruses and 
betaretroviruses, together with the newly identified 
(3ERV Env sequences (Figure 3). This analysis confirmed 
previous observations [12,38] that the Env sequences of 
some extant Type D betaretroviruses, namely MPMV, 
SMR and simian retrovirus serotypes 1 and 4 (SRV1 and 
SRV4), cluster with gammaretroviral Env, as do those of 
PvERV-PJ, K, PaERV-|3A, PaEnv-01 (Env sequence de- 
rived from P. alecto), and RfEnv-01 (Env sequence 
derived from R. ferrumequinum). Other Type D retrovi- 
ruses such as JSRV and the enzoonotic nasal tumor vi- 
ruses (ENTV) of sheep and goats did not fall into this 
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Figure 3 Phylogenetic comparison of the envelope (Env) protein sequence of betaretroviruses and gammaretroviruses. Bootstrap 
values <70% are not shown, and branch lengths are drawn to a scale of amino acid substitutions per site. Bootstrap values are denoted 
as ** >90%; * >70% and <90%. PERV proteins of P. vampyrus and P. alecto are highlighted in red text. PERVs of M. lucifugus and ft. ferrumequinum 
are highlighted in blue text. Gammaretroviruses are highlighted in teal text. 
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cluster. This indicates that a recombination event has 
occurred, in which a sub-lineage of Type D betare- 
troviruses acquired a gammaretroviral env gene. 

Analysis of bat PERV sub-genus clades 

Our analysis of the full-length bat |3ERVs revealed an un- 
expected diversity of genomic organizations, as a num- 
ber were found to contain unique ORFs. Some of these 
ORFs were in alternative reading frames within the core 
element domains and others were either upstream of 
gag, or downstream of env. Furthermore, the differential 
use of tRNA Lys 1,2 and tRNA Lys 3 was not found to 
be restricted to either Type B or Type D betare- 
troviruses. Rather, it appears that a switch between the 
two has occurred multiple times throughout the history 
of the genus. This diversity of genomic organization was 
used in conjunction with the phylogenetic analyses of 
Gag, Pol, and Env (with prime consideration given to 
the highly conserved Pol phylogeny) and the tRNA usage 
to identify eight distinct groups within the Betaretrovirus 
genus (Figure 4). The eight betaretroviral subgroups that 
we propose are distinguished from each other by major 
evolutionary differences such as deep phylogenetic diver- 
gence with strong bootstrap support (>90% of trees re- 
solving the clade), significant mutations in key genetic 
features such as a switch to the use of a different PBS, 
or the presence of retroviral genes from a different 
genus. 

Group I (represented by HERV-K113) consists of the 
HERV-K group of endogenous betaretroviruses which 
contain a PBS similar to tRNA Lys 1,2 and have a deep 
phylogenetic divergence from other betaretroviruses. No 
known exogenous betaretroviruses or bat |3ERVs cur- 
rently reside in Group I. 

Group II (represented by MIERV-pA) consists of 
a phylogenetic cluster of endogenous bat |3ERVs that 
branched off from Group I early in betaretroviral history. 
Three bat PERVs are included in this group. The PBS of 
M1ERV-|3A and M1ERV-|3B are complementary to tRNA 
Lys 1,2 and Lys 3, respectively, while the tRNA usage of 
PvERV-(3A is unknown as a 100 nucleotide NSR overlaps 
its PBS. M1ERV-|3A contains a large 1,493 nucleotide in- 
sertion within its 5' LTR that contains a 323 codon 
ORF. This insertion presumably arose post-integration 
and the nature of this genetic element is unknown. A 
Pfam domain search and BLASTp analysis of the trans- 
lation of the ORF against the NCBI non-redundant pro- 
tein sequence database did not identify any known 
protein domains or similarity to any known protein. 

Group III (Represented by M1ERV-|3C) consists of 
microbat ERVs that possess a phylogenetically divergent 
Pol (bootstrap support >90%) and a PBS complementary 
to tRNA Lys 3. Within this group is M1ERV-|3C, the first 
fully intact bat [3ERV to be identified, and which raises the 



possibility that exogenous members of Group III may yet 
exist as undiscovered infectious betaretroviruses. 

Group IV (Represented by M1ERV-|3E) appears to have 
diverged as a part of the Type B betaretroviral lineage. 
However, the precise phylogenetic position of Group 
IV's sole member, M1ERV-(3E, is not supported by high 
bootstrap support in any of the trees. Furthermore the 
precise identity of its PBS is uncertain. The PBS does 
not appear to be specifically complementary to either 
Lys 1,2 or Lys 3 tRNA, but rather it appears to be com- 
plementary to an alternative mammalian lysine tRNA. 
There are presently no known extra copies of M1ERV-|3E 
within the M. lucifugus genome. 

MIERV-pE is distinguished by its possession of a 
unique ORF upstream of Gag. This ORF begins within 
the 5 ' LTR and terminates three nucleotides upstream 
of the Gag start methionine, within the same reading 
frame. ORFs upstream of Gag may be relevant to Gag 
expression considering that murine gammaretroviruses 
encode an alternative N-terminally extended version of 
Gag, glyco-gag, that has a role in the promotion of viral 
replication [39,40]. No promoter elements or TATA 
boxes were predicted to exist upstream of the ORF, how- 
ever a TATA box is predicted within the ORF coupled 
with a possible start methionine downstream, encoding 
a potential 84 amino acid protein. 

Group V (Represented by PvERV-|3B) consists of arche- 
typically structured Type B betaretroviruses (MMTV-like) 
that contain long LTRs (~1,200 bases). It is possible that 
the extension of the 3 ' LTR has facilitated the emergence 
of ORFs in this location as in the case of MMTV's sag 
gene. In this regard, PvERV-pB has an ORF within its 3' 
LTR. This ORF is 123 codons in length, much shorter than 
MMTV's Sag protein, which is 320 amino acids long. 
While it is possible that the ORF was longer at integration 
and has simply been interrupted by stop codon mutations 
since that time, a tBLASTn analysis of MMTV's Sag pro- 
tein against the 3' LTR of PvERV-pB did not reveal any 
significant sequence similarity. Also in this group is 
EqERV, an endogenous horse betaretrovirus, which does 
not contain a sag gene or sag-like ORF within its 3' 
LTR [27]. 

Group VI (Represented by PvERV-pD) consists of 
JSRV-like Type D betaretroviruses that contain short LTRs 
(-300 bases) and Env protein sequences that do not 
phylogenetically cluster with those of the gammaretrovirus 
genus. Members of this group harbour a PBS complemen- 
tary to tRNA Lys 1,2 and may or may not contain 
additional ORFs within their core element domains, as is 
the case for JSRV and ENTV's ORF-x located within pol, 
and PvERV-pD, which has an ORF overlapping the 3' end 
of the env gene. 

Group VII (Represented by PvERV-|3F) consists wholly 
of bat pERVs. Group VII members are phylogenetically 
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Figure 4 Eight sub-groups of the betaretrovirus genus. A schematic diagram for a single representative of each group is depicted. Core 
retroviral genes gag, pro, pol, and env are bordered by the proviral long terminal repeats (LTRs). Also shown are other major genetic features such 
as open reading frames (ORFs) greater than 300nt in length and the rec gene of HERV-K1 13, enzymatic active site motifs of protease (DxG), 
reverse transcriptase (DDD), and integrase (DDE); the primer binding site (PBS) and polypurine tracts (PPT); and the characteristic betaretroviral 
dUTPase domain. NSR: non-sequenced region. ORF* is part of foreign nucleotide insertion within MIERV-(5A and does not appear to be a 
retroviral element. 



Hayward et at. Retrovirology 2013, 10:35 
http://www.retrovirology.eom/content/1 0/1/35 



Page 10 of 19 



Type D-like and are primarily distinguished by a PBS 
complementary to tRNA Lys 3 as opposed to tRNA Lys 
1,2 which is the expected PBS complementarity for Type 
D betaretroviruses. Also, several bat pERVs in this group 
possess a unique ORF upstream of Gag that is distinct 
from that of group IV's M1ERV-|3E. This ORF begins 
within the 5 ' LTR and terminates 26 nucleotides upstream 
of the gag start codon. Promoter elements and TATA 
boxes are predicted to exist upstream of this ORF. As 
there were differences in the start position of this ORF in 
the various group VII bat |3ERVs (PvERV-|3E - I), likely 
due to random mutation since integration, a nucleotide 
alignment of the region was generated (Additional file 1: 
Figure S2). The alignment demonstrated that the consen- 
sus ORF contained a possible start methionine that would 
code for a 101 amino acid protein. One member of this 
group, PvERV-[3E, is almost fully intact as it does not ap- 
pear to contain any frameshift mutations and only a single 
premature stop codon within the pro gene. 

Group VIII (Represented by PvERV-|3J) consists of 
MPMV-like Type D betaretroviruses. The distinguish- 
ing feature of this group is the possession of an 
encoded Env polyprotein that phylogenetically clusters 
with those of gammaretroviruses rather than those of 
other betaretroviruses. The bat pERVs in this group 
have an additional feature which is an ORF beginning 
40 bases downstream of the env stop codon and ter- 
minating 15 bases into the 3' LTR. This is exemplified 
in PvERV-PJ. A nucleotide sequence alignment of the 
extreme 3' region (Additional file 1: Figure S3) of the 
closely related PvERV-|3J, K, and PaERV-(3A generated 
a consensus sequence that contained this ORF and re- 
vealed that the equivalent ORF sequences in PvERV-(3K 
and PaERV-|3A are respectively interrupted by a 
frameshifting deletion mutation and stop mutation. 
This ORF contains a possible start methionine that 
would generate a 90 amino acid protein. This align- 
ment also indicated that the alternative ORF* in 
PaERV-|3A (Figure 1) was likely to be an artifact as the 
U3 region contained an eight nucleotide insertion that 
disrupts a stop codon which, if the insertion did occur 
after integration, has generated an artificial ORF. The 
PaERV-|3A genome was derived from Illumina based 
transcriptome sequencing while the PvERV-|3J and 
PvERV-|3K genomes were derived through whole- 
genome shotgun/Sanger sequencing. Accordingly, each 
method can be used to orthogonally verify the other. A 
full alignment of the three proviruses (Additional file 3: 
Figure S4; demonstrating 96.66% nucleotide identity 
between PvERV-|3J and PvERV-|3K and 93.77% between 
PvERV-|3K and PaERV-(3A) supports the veracity of 
these proviral sequences and provides further evidence 
that the group VIII |3ERVs are likely derived from a sin- 
gle integration event. 



The unique ORFs identified in the bat (3ERVs of all 
groups were subjected to a BLASTp analysis against the 
NCBI non-redundant protein database and Pfam domain 
search. However, no BLAST hits or known protein do- 
mains were identified. 

Analysis of elements involved in nuclear export of intron- 
containing bat betaretroviral RNA 

To determine if the groupings we had assigned were 
congruent with known functional differences between 
retroviruses with respect to betaretroviral RNA nuclear 
export strategies, we analyzed the bat |3ERVs, along- 
side known exogenous and endogenous betaretroviruses, 
for evidence of motifs indicative of the major export 
strategies (Additional file 2: Table S4). To this end we 
employed a computational analysis to search for the 
presence of nuclear localization signals (NLS) and nu- 
clear export signals (NES) common to the retroviral 
Rev-like proteins used in the archetypal Rev/Rev-respon- 
sive element (RRE) equivalent export mechanism. We 
also searched for the presence of Tap-binding elements 
(TBE) within and downstream of the env gene, which 
would imply the utilization of the CTE export pathway, 
and for direct nucleotide repeats (DR) and inverted nu- 
cleotide repeats (IR) that might suggest the formation of 
stem-hairpin-loop structures known to be associated 
with the CTE [23]. 

While a number of |3ERVs were predicted to contain 
either an NLS or an NES, only M1ERV-(3B and PvERV- 
|3B were found to contain both. These |3ERVs broadly 
cluster with HERV-K and MMTV, which respectively 
encode the Rev-like proteins Rec and Rem, and the pres- 
ence of both NLS and NES points to the possibility that 
they encode Rev-like proteins and make use of the Crml 
nuclear RNA export pathway. The majority of the |3ERVs 
in group VII were found to contain TBE, indicating that 
the original exogenous forms of these retroviruses likely 
utilized the nuclear export pathway accessed by the 
CTE. 

Molecular clock analysis of LTRs 

We used an analysis of the LTRs to estimate the time 
since integration of the bat (3ERVs. This analysis evalu- 
ated the extent of the difference between the nucleotide 
sequences of the 5' and 3' LTRs of each |3ERV, which 
are expected to be identical at the time of integration. 
The number of nucleotide differences between the 5' 
and 3' LTR is assumed to be proportional to the time 
since integration, although this may be compromised by 
such factors as gene conversion [41]. Under this as- 
sumption, all [3ERVs integrated into the genomes of the 
ancestors of modern bats within a wide time range of 
between 3.2 and 36.3 million years ago (mya), and hence 
long after the divergence of bats from other mammalian 
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lineages (Table 3). This, in turn, suggests that (i) that the 
original exogenous forms of these (3ERVs targeted an- 
cient bats, and (ii) there has been a continual integration 
of betaretroviruses into bat genomes during their evolu- 
tionary history. 

Betaretroviral evolution and diversification 

We coupled our analysis of the genomic features of the 
bat |3ERVs with the phylogenetic patterns observed in 
the Gag, Pol, and Env trees (with primacy given to 
the phylogeny of the highly conserved polymerase se- 
quences) to generate a hypothetical series of events that 
may have led to the current state of diversity in the 
genus Betaretrovirus (Figure 5). 

Our analysis indicates that while the ancient progeni- 
tor betaretrovirus likely made use of a tRNA Lys PBS, its 
specific identity is uncertain. Groups I and II appear to 
have branched off together early in betaretroviral history. 
This has led, in the case of the HERV-K betaretroviruses, 
to the emergence of distinct genetic elements such as 
the NP9 and Rec proteins, whose current endogenized 
forms have possible roles in tumorgenesis [42,43]. Group 
Ill's phylogenetic position places its point of divergence 

Table 3 Estimation of time since integration 



Divergence 3 Age (mya) b 



P. vampyrus 






PvERV-pA 


0.024 


30 


PvERV-pB 


0.011 


13.8 


PvERV-pC 


0.027 


33.8 


PvERV-pD 


0.056 


ND 


PvERV-pE 


0.011 


13.8 


PvERV-pF 


0.043 


ND 


PvERV-pG 


0.039 


ND 


PvERV-PH 


0.006 


7.5 


PvERV-pi 


0.029 


36.3 


PvERV-pj 


0.005 


6.3 


PvERV-pK 


0.025 


ND 


M, lucifugus 






MIERV-PA 


0.055 


29 


MIERV-PB 


0.043 


22.6 


MIERV-PC 


0.008 


4.2 


MIERV-PD 


0.012 


6.3 


MIERV-PE 


0.007 


3.7 


MIERV-PF 


0.006 


3.2 



a 5' and 3' LTR divergence: number of differences, per nucleotide, per site. 
b Molecular clock dating was used to estimate the time in millions of years 
(mya) since the integration of each betaretrovirus into the host genome, 
based on the number of nucleotide differences between the 5' and 3' LTRs of 
each betaretrovirus [25]. 

ND: Not dated; these (3ERVs could not be dated using this method. PvERV-(3D 
and PvERV-^F contained non-sequenced regions within their 5' LTR, while 
PvERV-pG and PvERV-(3K contained bulk deletions within their 3' LTRs. 



after the split of Groups I and II but prior to the split 
between the Type B and Type D lineages. 

The divergence between Type D and Type B |3ERVs 
seems to have occurred as a result of their differential 
use of tRNA Lys 1,2 and tRNA Lys 3, respectively. 
Within the Type B lineage are groups IV and V which, 
although possibly splitting after the divergence of Type 
B and Type D, differ in the length of their LTRs, their 
tRNA usage, and their additional genetic elements. 
Within the Type D lineage an early event appears to 
have been a recombination between a betaretrovirus and 
a gammaretrovirus, which has caused a divergence be- 
tween JSRV-like and MPMV-like Type D betaretro- 
viruses. In this split, group VIII appears to have diverged 
from groups VI and VII through the acquisition of a 
gammaretroviral env gene. Group VII later diverged 
from group VI by a switch from the use of tRNA Lys 1,2 
to tRNA Lys 3 and differentiation of their additional 
ORFs. 

Discussion 

We searched for the expression of betaretroviral genes 
in the transcriptomes of the megabat P. alecto and 
the microbats R. megaphyllus and R. ferrumequinum. 
Through this analysis we determined that betaretroviral 
genes were being transcribed into mRNA within each 
species and we identified that a full-length genomic 
transcript of a betaretrovirus (PaERV-(3A) was being 
expressed in P. alecto. As each of the genes of PaERV- 
|3A were found to contain mutations that likely rendered 
them non-functional, it seems reasonable to conclude 
that the transcript was expressed from a defective pERV 
rather than a functional exogenous betaretrovirus. It is 
important to note that we cannot exclude the possibility 
that the reported PaERV-(3A transcript was derived from 
multiple similar sequences during transcriptome assem- 
bly and due to recombination between similar tran- 
scripts during cDNA synthesis or PCR as published [44]. 

Our analysis of the genomes of the megabat P. vampyrus 
and the microbat M. lucifugus revealed that they contain a 
genetically diverse range of full-length (3ERVs. In the case 
of M. lucifugus this included an intact [3ERV (M1ERV-|3C) 
that did not contain any mutations that would clearly ren- 
der the gene products non-functional. However, it should 
be noted that as revealed by the LTR analysis, nucleotide 
substitutions have occurred in the M1ERV-[3C sequence. 
While the critical enzymatic active site motifs are intact, 
whether or not the nucleotide substitutions that have oc- 
curred in the coding domains would have a detrimental ef- 
fect on the functionality of the gene products is not 
known. 

In analyzing the genetic content of the full-length 
[BERVs for the presence of ORFs, aside from those cod- 
ing for the core genes, we set a minimum cut-off of 100 
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Figure 5 A proposed series of events leading to the current diversity in the genus Betaretrovirus. The proposed series of evolutionary 
events leading to eight distinct sub-groups of betaretroviruses based on a combination of the phylogenetic analyses of Gag, Pol, and Env protein 
sequences, and the genomic features and organizations of individual betaretroviruses. 



codons to limit the amount of incidental non-coding 
ORFs that would be identified. However, many retroviral 
accessory and regulatory genes, such as rec and np9 of 
HERV-K and vpr and tat of HIV-1, are shorter than 100 
codons and are often encoded over the span of two 
exons. Despite the high minimum cut-off, it is striking 
that the bat pERVs possessed a diverse array of add- 
itional ORFs. While we cannot confirm that these are in- 
deed protein coding domains, much less speculate on 
their function, the existence of similar elements is not 
without precedent among the betaretroviruses. One ex- 
ample is the 'ORF x' of JSRV, the function of which is 
unknown but it has been found to be broadly conserved 
amongst JSRV isolates [45]. Several of the ORFs we 
identified overlap the proviral LTRs, which consist of 
typically untranslated regions. This is also not unprece- 
dented, with a prime example being the sag gene of the 
betaretrovirus MMTV, which is situated entirely within 
the U3 region of the 3' LTR. The presence of unique 
ORFs in pERVs may indicate the evolution of novel 
retroviral genes whose products have regulatory or 
accessory functions required for the retroviral life-cycle 
and/or pathogenesis. In addition to the pERVs reported 
in this study we noted the presence in both mega- and 
microbats of betaretrovirus-like retroelements that re- 
semble |3ERVs but lack the env gene; these were not in- 
vestigated further (data not shown). 

We reported each [3ERV as a distinct entity. Neverthe- 
less it is reasonable that some of their number, particularly 
the pERVs within each of groups VII and VIII, represent a 



common progenitor infectious betaretrovirus that has 
undergone duplication events via retrotransposition or re- 
combination since an original, single integration event. 
For example, the integration time of PvERV-pj coupled 
with its similarity to PaERV-[3A and PvERV-pT< may mean 
that these pERVs originated from a single integration into 
the genome of the common ancestor of P. vampyrus and 
P. alecto and that at least a single duplication event has 
occurred within P. vampyrus (or the common ancestor). 
However, it is also arguable that multiple integrations of 
closely related infectious retroviruses separated from each 
other by perhaps a small number of infectivity cycles oc- 
curred. We attempted to address this question by a com- 
parative analysis of the flanking genomic DNA located 
immediately up- and downstream of the proviruses and by 
identifying the TSD that border each provirus and arise as 
a by-product of the integration mechanism [46]. Unique 
TSD indicate distinct integration events. In the case of the 
group VIII [3ERVs a TSD for PvERV-|3K could not be iden- 
tified as its 3 ' LTR appears to be truncated. This may indi- 
cate that it is a copy of PvERV-pJ that has arisen through a 
chromosomal duplication event. This appears to be con- 
firmed by the identification of genomic DNA bordering 
PvERV-|3J that is homologous to genomic DNA flanking 
PvERV-(3K. As PaERV-[3A is a genomic transcript it does 
not contain TSD. In the case of group VII [3ERVs all of the 
identifiable TSD differ from one another, indicating separ- 
ate integration events. Additionally, no flanking genomic 
DNA homology was identified amongst the members of 
the group. 
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Notably, both the phylogenetic and LTR analyses re- 
vealed a great diversity of (3ERVs in bat genomes. Our 
molecular clock dating suggested that the earliest viral 
incorporation event occurred at approximately 36 mya 
which is older than the separation of the megabats and 
microbats studied (around 20 mya) [28]. In addition, it is 
clear that some of the |3ERVs present in bat genomes 
were vertically transmitted from their ancestors; e.g. 
M1ERV-|3A and PvERV-(3A are grouped together and are 
of similar age having been integrated approximately 30 
mya. However, it is also the case that many of the bat 
|3ERVs formed via independent viral invasion and in- 
corporation as they have different phylogenetic positions 
as well as different estimated ages of integration. 

In addition to their genomic diversity, we observed that 
a number of phylogenetic clusters within the genus dif- 
fered in their more fundamental aspects. Specifically, the 
use of tRNA Lys 1,2 or tRNA Lys 3 was not restricted to 
the divide between Type B and Type D betaretroviruses, 
and a clade that was distinct in both Gag and Pol trees 
possessed a gammaretroviral env gene. This prompted us 
to define eight sub-groups (Group I-VIII) within the genus 
that accounted for these fundamental differences in the 
context of phylogenetic divergences at the amino acid 
level of the core polyproteins. Our LTR analysis also re- 
vealed that bats have been infected with betaretroviruses 
for most of their evolutionary history. This supports the 
notion that bats are a potential reservoir for infectious 
betaretroviruses. 

A previous study reported a short, partial retroviral se- 
quence (CpERV-|35, AC138156) in the genome of the 
microbat Carollia perspicillata (Seba's short-tailed bat) 
[25]. However, this sequence contained large deletions, 
was missing the entire pro and pol genes, and only frag- 
ments of the gag and env genes remained. The partial Env 
of CpERV-|35 most closely matched the Env of the 
betaretrovirus SMR and on that basis it was reported as a 
betaretroviral sequence. In this study, we report a series 
of complete [3ERVs in mega- and microbat genomes 
representing the breadth of the genus Betaretrovirus. 
Although CpERV-p5 does contain a lysine tRNA-specific 
PBS, without a pol gene to phylogenetically differentiate it 
or the presence of the characteristically betaretroviral 
dUTPase domain within pro, it cannot be known with cer- 
tainty whether it is a group VIII betaretrovirus or a 
gammaretrovirus. The study by Bailie et al. [25] and a re- 
cent study by Anai et al. [47] both noted the similarity be- 
tween the Env of Type C gammaretroviruses and some 
Type D betaretroviruses which was attributed to a likely 
recombination event. We have shown that the betare- 
troviruses, which possess a gammaretrovirus -like Env, 
form a single clade in both Gag and Pol phylogenies. This 
indicates that a single recombination event produced 
these group VIII betaretroviruses. Furthermore, the typical 



mammalian gammaretroviral use of tRNA proline and 
glycine-specific PBS and the absence of dUTPase domains 
from their pro genes [14] can be used to infer that the na- 
ture of the recombination event was the insertion of a 
Type C gammaretroviral env gene into a Type D betare- 
trovirus. Previous studies also determined a recombi- 
natorial origin for the Type D env [12,38]. However, this 
conclusion was reached prior to the sequencing of 
the genome of JSRV [48], which does not possess a 
gammaretrovirus-like Env, and its subsequent classifica- 
tion as a Type D retrovirus. As such, it was hypothesized 
that it was this recombination event that gave rise to the 
Type D lineage of betaretroviruses [12,38]. Our analysis 
aimed to provide a clarification of the differences between, 
within, and outside of the Type B and D groups of 
betaretroviruses. Accordingly, we suggest that the funda- 
mental feature giving rise to the division between the Type 
B and D lineages may have been the use of different pri- 
mer binding sites, not the possession or not of a Type C 
env gene, which appears to be a more recent and more 
significant lineage divergence within the Type D group. 

Bailie et al. [25] described seven groups within the 
genus Betaretrovirus. These groupings were made solely 
on the basis of pol gene nucleotide sequence similarity. 
While manually determining amino acid sequences from 
genes that contain frameshift mutations is difficult, 
when the manual reconstruction is closely informed by 
the alignment of each translated frame against known 
betaretroviral polymerases, amino acid sequence recon- 
struction is a viable option. As such, our phylogenetic 
analyses differ from those undertaken previously in that 
they are based on amino acid sequence alignments, and 
our groupings are based on differences in the fundamen- 
tal genomic features in addition to phylogenetic cluster- 
ing. Tristem [49] reported on the identification and 
classification of the highly diverse endogenous retrovi- 
ruses present in the human genome (HERVs) and 
suggested that tRNA PBS specificity, in addition to 
the polymerase phylogeny of endogenous retroviruses, 
should inform their classification. This is because even if 
the ERVs of a given species cluster together in phyloge- 
nies, the use of different tRNA PBS may be evidence of 
separate origins. Indeed, that study made the assumption 
that HERVs with alternative PBS homologies were de- 
rived from cross-species transmissions. With this in 
mind, we analyzed the PBS sequences of the identified 
[3ERVs and used this information to aid and inform the 
delineation of our grouping scheme. 

Mammalian cells restrict the export of intron contain- 
ing mRNA from the nucleus to the cytoplasm, and 
betaretroviruses have been found to utilize two different 
mechanisms to circumvent this restriction and export 
unspliced genomic RNA and singly-spliced env mRNA. 
The Type B betaretrovirus MMTV, and the HERV-K 
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endogenous retroviruses are known to use Rem and Rec, 
respectively, which are HIV Rev-like export proteins, that 
possess equivalent mechanisms of action [17,50-52]. The 
Type D betaretroviruses MPMV and SRV make use of 
the ris-acting CTE, which in the absence of a retroviral 
accessory protein, recruits cellular proteins to effect nu- 
clear export of intron containing viral RNA [22,23]. This 
apparent dichotomy has been complicated by recent lines 
of investigation that have found that i) MMTV likely pos- 
sesses a second, Rem-independent mechanism for the ex- 
port of singly-splice env mRNA [52]; and ii) the Type D 
betaretrovirus JSRV contains both a CTE and a Rev-like 
protein, Rej, which while found to possess a primary func- 
tion related to Gag synthesis, also enhances RNA export 
in some cell types [21,53]. This indicates that betare- 
troviruses may make use of multiple export mechanisms, 
possibly providing some measure of redundancy to pro- 
mote productive replication in different contexts. 

We conducted a computational analysis to predict the 
presence of RNA export motifs that would indicate 
which mechanism was utilized by each pERV. We found 
that bat pERVs, clustering with betaretroviruses known 
to utilize the Crml export pathway, typically contained 
one or both of the NLS and NES motifs, suggesting that 
they too encode a Rev-like protein. It was not surprising 
that some (BERVs were predicted to contain one motif 
but not the other, as random mutation since integration 
is expected to interfere with sequence-based motif 
prediction. It is also possible that the NES of some 
betaretroviral Rev-like proteins (such as is the case for 
HERV-K Rec) are encoded at the exon boundary and/or 
within a frame different to that used by env, making the 
prediction of NES from the Env protein sequence chal- 
lenging. A number of PERVs in group VII were found to 
contain retroviral Tap-binding motifs, defined as pub- 
lished [23], implicating their use of the CTE:Tap export 
pathway. The presence of putative NLS and NES in 
some group VII PERVs suggests that Rev-like elements 
may also be present. 

As Rev-like proteins are encoded within the env gene, 
the recombination event that replaced the betaretroviral 
env with a gammaretroviral env and gave rise to group 
VIII would have caused the incidental loss of any 
encoded Rev-like protein. Such a lineage would only 
have remained viable if it either possessed an alternative 
mechanism for export, or never made use of a Rev/RRE 
equivalent export mechanism in the first place. That 
Rev-like proteins are widely distributed amongst the 
betaretroviruses suggests that it is not unreasonable that 
the progenitor of group VIII did possess a Rev-like pro- 
tein. This possibility is supported by the existence of the 
Rej protein of JSRV, as JSRV clusters alongside group 
VIII in the Type D lineage. In addition, several bat 
|3ERVs in groups VI and VII contain putative NLS and 



NES motifs, suggesting that members of these groups 
contain Rev-like elements. 

If group VIII did lose a Rev-like protein upon acquisi- 
tion of a gammaretroviral env, then two explanations for 
the lineage's survival are apparent: i) The recombination 
event was confined to env and the betaretroviral CTE 
possessed by MPMV and SRV, which is located immedi- 
ately downstream of env, already existed as a redundant 
export mechanism and remained after the event, or ii) 
The recombination event included the nucleotide se- 
quence downstream of the env gene, and a putative 
CTE-like element was acquired in the process. With re- 
gard to the second possibility it is important to note that 
the mRNA nuclear export mechanism of gammare- 
troviruses has not been elucidated and the proposal of a 
CTE-like element remains hypothetical. However, this 
notion is supported by the observation that accessory 
proteins have not been reported for gammaretroviruses, 
expression of unspliced and singly-spliced viral mRNA 
would require nuclear export, and that a CTE-like CIS- 
acting nuclear export element would necessarily be lo- 
cated in singly-spliced env mRNA. In either event, our 
analysis leads to the surprising implication that the 
betaretroviruses are part of a fundamentally complex 
retroviral genus and that one lineage, group VIII, has 
evolved through gene replacement into a simple retro- 
virus sub-group that does not possess any distinct 
accessory proteins or virulence factors. 

Using the phylogenetic analysis of retroviral Pol se- 
quences we proposed a pathway through which the 
genus Betaretrovirus may have evolved from its progeni- 
tor. This hypothetical evolutionary history paints an in- 
teresting picture of a broad and diverse retroviral genus 
whose distribution may be largely unrestricted by host 
species barriers. The |3ERV members of a number of 
groups are represented in hosts who are distantly re- 
lated, such as group VIII, which contains host species 
from bats, primates, rodents, and marsupials. This sug- 
gests that cross-species transmission of betaretroviruses 
is a likely and common occurrence, such that betare- 
troviruses may be particularly adept at evading host de- 
fences. This possibility is intriguing, particularly in light 
of the wide array of additional ORFs found within the 
genus that hint at the existence of as yet undiscovered 
betaretroviral accessory and virulence factors; these 
could, for example, act as countermeasures to circum- 
vent the action of host intracellular restriction factors 
that are known to act as barriers to cross-species trans- 
mission [54]. The wide distribution of diverse pERVs in 
bats and rodents suggests that these two largest groups 
of mammals play a major role as both hosts and 
cross-species transmitters for betaretroviruses. Bats and 
rodents are globally distributed, appearing on all conti- 
nents with the exception of Antarctica [30,55]. As such 
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it appears reasonable to postulate that they have both 
played a large role in the global spread and evolution of 
betaretroviruses. 

Conclusions 

We have demonstrated the presence of a range of [BERVs 
in mega- and microbats that possess a diversity that can- 
not be confined to the classical Type B/Type D division. 
Among their number we identified an intact (3ERV that 
may be capable of producing infectious virions, and our 
LTR analysis indicates that betaretroviruses have been 
circulating in bat populations throughout their evolution 
and likely still do. 

Our evidence that bats have carried a range of exogen- 
ous infectious betaretroviruses and that cross-species 
transmission has been commonplace has important im- 
plications for disease emergence. Indeed, the reported 
association between the betaretrovirus MMTV and hu- 
man breast cancer and primary biliary cirrhosis may 
mean that betaretroviral zoonosis is already causing dis- 
ease in humans [56-58]. Urban expansion into the nat- 
ural habitats of bats is gradually increasing the amount 
of overlap between bat and human environments, and 
with it the amount of contact between bats and humans 
[59] . In many countries the practice of hunting bats as a 
source of consumable bushmeat is common [60]. These 
circumstances provide the opportunity for retroviral 
transmission between bats and humans. We propose 
that the transmission of a betaretroviral infection from 
bats into humans is possible. As such, it is imperative to 
continue to survey those viruses present in bats. 

Methods 

Generation of bat transcriptomes 

Approval for the use of bat tissue was granted by the 
Australian Animal Health Laboratories Animal Ethics 
committee (Protocol AEC1281) and by the Animal Ethics 
Committee of East China Normal University (Approval 
Number 20110224). P. alecto transcriptome datasets were 
generated from the non- stimulated thymus tissue of a 
healthy male juvenile bat and the pooled total RNA 
obtained from mitogen-stimulated spleen, white blood 
cells, and lymph node and the unstimulated thymus and 
bone marrow obtained from one pregnant female and one 
adult male as described previously [61]. The P. alecto tran- 
scriptome is accessible through the NCBI Sequence Read 
Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/) [SRA: 
SRP008674]. The R. ferrumequinum transcriptome was 
generated using whole brain tissue as published [37]. The 
P. alecto and R. ferrumequinum transcriptomes were se- 
quenced using the Illumina next-generation sequencing 
(NGS) platform as described previously [37,61]. The P. 
alecto transcriptome was assembled using Velvet, Oases, 
and MIRA software packages as described previously [61]. 



The R. ferrumequinum transcriptome was assembled 
using the Brujin graph and SOAPdenovo software pack- 
ages as described previously [37] . 

The generation of the R. megaphyllus transcriptome 
was conducted as follows: Four wild bats, (one female 
and 3 male) were caught in the Booloumba Creek caves 
in Queensland, Australia in November 2006 and tissues 
from brain, kidney, large and small intestines, liver, lung, 
spleen, heart, skin, bone and reproductive organs were 
pooled and stored in RNAlater (Ambion). Total RNA 
was isolated from the 12 pooled bat tissues using the 
Qiagen RNeasy kit. DNA was prepared from purified 
total RNA (2.5 ug per cDNA reaction) using the Evrogen 
MINT cDNA synthesis kit (CAT # SK001) but with a 
modified oligodT adapter primer containing the recogni- 
tion sequence for Gsul (5' AGCAGTGGTATCAACG 
CAGAGT CTGGAG(T) 20 VN). The cDNA was normal- 
ized with a duplex specific nuclease (DSN) using a 
modification of the protocol described in the Evrogen 
Trimmer cDNA normalization kit (Cat # NK001). After 
the second limited PCR amplification (12 cycles) with 
the M2 primer, PCR buffer, primers and enzyme were 
removed using the Machery Nagel Nucleospin II kit. 
DNA was then digested overnight with Gsul to remove 
the 3' polyA tail adapter sequence so as to remove 
stretches of homopolymer Ts and As which can effect 
the 454 sequencing run due to cross-talk (homopolymer 
flash). Five micrograms of normalized amplified double 
stranded cDNA was purified using the Machery Nagel 
Nucleopsin kit with the selective removal of the Gsul 
digested 43 base pair (bp) 3 poly A adapter sequence 
using a modification of the binding conditions. Library 
preparation for Roche 454 sequencing for the GS FLX 
platform was performed by the Australian Genome 
Research Facility Ltd, St Lucia, Queensland with se- 
quence output of 74 MB, 374,360 single-end reads with 
an average read length of 239 bp. CLC Genomics Work- 
bench version 4.5.1 (CLC Bio, Aarhus, Denmark) was 
used to trim reads based on quality and to remove the 
Evrogen normalization primer sequence, Subsequent 
337,805 reads were de novo assembled using CLC 
Genomics Workbench default settings and BLAST data- 
bases were prepared using either de novo assembled or 
trimmed unassembled reads. 

Analysis of bat transcriptomes 

To search for evidence of betaretroviral gene expres- 
sion within the bat transcriptomes we retrieved 
the genome sequences of extant betaretroviruses 
from GenBank (http://www.ncbi.nlm.nih.gov/genbank/), 
specifically: Mouse mammary tumor virus (MMTV) 
[GenBank: NC_001503.1], Mason-Pfizer monkey virus 
(MPMV) [GenBank: NC_001550.1], Jaagsiekte sheep 
retrovirus (JSRV) [GenBank: NC_001494.1], Simian 
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retrovirus (SRV) [GenBank: NC_014474.1], and Squirrel 
monkey retrovirus (SMR) [GenBank: NC_001514.1]. 
The gag, pol, and env genes of each genome sequence 
were translated into protein sequences using the CLC 
Main Workbench 6.6 (CLC Bio). To identify the tran- 
scripts of interest we used the tBLASTn function of the 
CLC Main Workbench incorporating the following pa- 
rameters: BLOSUM62 matrix, word size = 3, E-values < 
lxlCT 10 , gap costs of existence 11, extension 1, and low 
complexity filtered. To confirm that the transcripts 
identified were more similar to betaretroviruses than 
other retroviral genera we performed a reciprocal 
BLAST analysis of each transcript against the NCBI 
non-redundant protein database (http://blast.ncbi.nlm. 
nih.gov/Blast.cgi) using the BLASTx function of the 
CLC Main Workbench with the following parameters: 
BLOSUM80 matrix, word size = 3, E-values < 1 x 10" 10 , 
gap costs of existence 10, extension 1, low complexity 
filtered, and limit by entrez query = Viruses. Annotated 
sequences of the full-length betaretroviral sequences in- 
cluded in the phylogenetic analyses (PaERV-pA, PaPol- 
01, PaEnv-01, and RfEnv-01) are included as Additional 
file 4. 

Assembly of PaERV-pA 

We generated the genomic sequence of PaERV-|3A using 
two transcripts identified in the P. alecto transcriptome 
during the initial BLAST analysis which were aligned 
using the CLC Main Workbench and trimmed by 245 
and 401 nucleotides at the 5' and 3' extremities of their 
overlapping region, respectively. 

Genomic mining 

To determine the presence of full-length pERVs in 
mega- and microbats we retrieved the genomes of m 
P. vampyrus and M. lucifugus from the Ensembl data- 
base (http://www.ensembl.org/index.html). We searched 
for genomic sequences with similarity to the aforemen- 
tioned extant betaretroviral proteins by conducting a 
tBLASTn analysis of the genomes using the CLC Main 
Workbench with the following parameters: BLOSUM62 
matrix, word size = 3, E-values < lxlO' 10 , gap costs of 
existence 11, extension 1, and low complexity filtered. 
We searched for genomic sequences with similarity to 
the betaretroviral transcripts identified in the bat 
transcriptomes by conducting a tBLASTx analysis of the 
genomes using the CLC Main Workbench with the fol- 
lowing parameters: BLOSUM80 matrix, word size = 3, 
E-values < 1x10' , low complexity filtered. To sort full- 
length from fragmented |3ERVs and various other 
retroelements within the BLAST output, a script was 
created using Microsoft Office Excel 2003 (Microsoft 
Corporation, Redmond, USA) that compared the BLAST 
data for the Gag, Pol, and Env analyses and identified 



scaffolds that emerged as a hit in each. The long ter- 
minal repeats (LTRs) which were used to delineate the 
full-length |3ERVs were identified by subjecting each 
identified gene scaffold to a BLASTn analysis in which 
the entire sequence was aligned with itself to identify re- 
peated sequences using the following parameters: Word 
size = 11, Match score = 1, Mismatch score = -3, gap 
costs of existence 5, extension 2, and low complexity 
filtered. 

Annotation of bat (JERVs 

Transcription promoter elements within the 5' LTRs of 
the |3ERVs were predicted using the online promoter 
predictor tool NNPP 2.2 [62] (http://www.fruitfly.org/ 
seq_tools/promoter.html). TATA boxes were predicted 
using the Hamming-Clustering method through the on- 
line HCtata tool [63] (http://zeus2.itb.cnr.it/~webgene/ 
wwwHC_tata.html). Poly(A) signal sites were predicted 
using the Hamming-Clustering method through the on- 
line HCpolya tool [63] (http://zeus2.itb.cnr.it/~webgene/ 
wwwHC_polya.html). Primer binding sites were identi- 
fied by an alignment of the genomic nucleotide sequence 
between the 5' LTR and the beginning of the gag gene 
of each [3ERV against the University of Strasbourg's on- 
line tRNA database [64] (http://trna.bioinf.uni-leipzig. 
de/DataOutput/Search) using the associated BLAST tool 
(default parameters). Open reading frames (ORFs) were 
identified within each [3ERV using the CLC Main Work- 
bench. The dUTPase protein domains and nucleocapsid 
zinc knuckles were identified by subjecting the trans- 
lated gag and pro genes to a protein family (Pfam) do- 
main search [65] through the CLC Main Workbench 
using the publicly accessible Pfam database (http://pfam. 
sanger.ac.uk/). The conserved major homology region 
(MHR) of Gag and enzymatic active sites of the retro- 
viral protease (DxG), reverse transcriptase (DDD), and 
integrase (DDE) were identified through a protein se- 
quence alignment, using the Create Alignment function 
of the CLC Main Workbench, between the Gag, Pro, 
and Pol of each bat (3ERV against those of the aforemen- 
tioned extant betaretroviruses. 

Prediction of RNA export elements 

NLS and NES were predicted by analyzing the Env, 
or if known, the Rev-like protein sequence of each 
betaretrovirus. NLS were predicted using the online 
tool cNLS mapper [66] (http://nls-mapper.iab.keio.ac.jp/ 
cgi-bin/NLS_Mapper_form.cgi) with a prediction score 
threshold of 3.0. NES were predicted using the online 
tool NetNES 1.1 [67] (http://www.cbs.dtu.dk/services/ 
NetNES/). The strength of each NES prediction within 
the Env/Rev-like protein is defined as strong if the 
scores for the neural network model and hidden Markov 
model, together with the overall NES score, are above 
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the algorithm-assigned threshold. The strength is weak 
if one of the scores is below the threshold. No NES is 
predicted for proteins in which more than one score is 
below the threshold. TBE, DR, and IR were identified by 
subjecting the nucleotide sequence within and down- 
stream of env ending at the poly(A) signal site within the 
3' LTR of each betaretrovirus to a BLASTn analysis in 
which the sequence was aligned against itself to identify 
repetitive elements using the following parameters: 
Word size =11, Match score = 1, Mismatch score = -3, 
gap costs of existence 5, extension 2, and low complexity 
not filtered. 

Sequence alignments 

All nucleotide and protein alignments were conducted 
using the Create Alignment function of the CLC Main 
Workbench except where stated otherwise. 

Phylogenetic analyses 

To determine the evolutionary relationships among the 
different bat betaretroviruses we inferred the phylogen- 
etic relationships among the Gag, Pol and Env amino 
acid sequences. All of the reference sequences were 
downloaded from NCBI (Additional file 2: Table S5) and 
aligned with bat sequences using MUSCLE [68]. We 
employed the Gblocks program [69] to remove regions 
of high sequence diversity and hence uncertain align- 
ment. Phylogenetic relationships were then inferred 
using the maximum likelihood (ML) method available 
in PhyML 3.0, employing SPR (subtree pruning and 
regrafting) branch-swapping [70] and incorporating 
1,000 bootstrap replications to determine the robustness 
of each node. The ProtTest 2.4 program [71] was used 
to select the best-fit model of amino acid substitution, 
which was found to be LG+I+r for all data sets. 

Molecular clock dating 

A time-scale for [3ERV evolution was established as de- 
scribed previously [36] and employing the Bayesian Markov 
chain Monte Carlo method (MCMC) available in the 
BEAST vl.7 package [72]. We first acquired the genomic 
substitution rates (R) for mega- and microbats. For this, di- 
vergence times of mega- and microbats were taken from 
the fossil record [28] and used to calibrate date estimates 
for the rest of the species tree, assuming an uncorrelated 
lognormal relaxed molecular clock. All phylogenetic trees 
were inferred using the GTR substitution model and the 
Yule speciation prior, and the BEAST analyses were run 
until all relevant parameters converged, with 10% of the 
MCMC chains discarded as burn-in. The estimated substi- 
tution rates were then used to calculate the age of each 
|3ERV using the following formula: T={DIR)I2, where T is 
the invasion time of each [3ERV (million years), D is the 
number of differences per site among the both 5' and 3' 



LTRs, and R is the genomic substitution rate (substitutions 
per site per year). 

Accession numbers 

The GenBank accession numbers of the retroviruses used 
in this study are listed in Additional file 2: Table S5. 

Additional files 



Additional file 1: Figure SI. Alignment of extant and bat betaretroviral 
primer binding sites (PBS). The PBS of bat endogenous betaretroviruses 
and those of known extant and exogenous betaretroviruses are aligned 
and grouped according to the specific lysine tRNA complementary to the 
PBS. The PBS complementarity of MIERV-f3E is uncertain. Figure S2. 
Alignment of the ORF present in the group VII endogenous 
betaretroviruses (PERVs) of bats. The region from the beginning of the 5' 
LTR to the beginning of the gag gene of each group VII bat pERV was 
aligned and a consensus sequence generated. The annotations belong to 
the consensus sequence and depict the 5' LTR, predicted promoter 
element and TATA boxes, the PBS complementary to tRNA Lys3 (Lys 3 
PBS), and an open reading frame (ORF). Figure S3. Annotated alignment 
of the group VIII endogenous betaretroviruses (PERVs) of bats. The region 
from the end of the env gene to the 3' long terminal repeat (LTR) of each 
group VIII bat PERV was aligned and a consensus sequence generated. 
The annotations belong to the consensus sequence and depict an open 
reading frame (ORF), the beginning of the 3' LTR, and mutations in 
PaERV-fiA and PvERV-(3K that influence the presence 
of ORFs. 

Additional file 2: Table SI. Comparison of BERV polymerase 
sequences to those of known betaretroviruses. Table S2. Identification of 
the target site duplications (TSD) flanking endogenous betaretroviruses. 
Table S3. Comparison of the 5' and 3' flanking regions of 
phylogenetically clustered pERVs. Table S4. Analysis of betaretroviral RNA 
export motifs. Table S5. GenBank accession numbers and Ensembl 
database locations of the retroviruses used in this study. 

Additional file 3: Figure S4. Unannotated alignment of the full proviral 
genomes of the group VIII endogenous betaretroviruses (PERVs) of bats. 

Additional file 4: Annotated sequences of PaERV- PA, PaPol-01, 
PaEnv-01, and RfEnv-01. 
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